Using Fractional Factorial Designs for Variable Importance in Random Forest Models

Authors

  • Ewa M. Sztendur
  • Neil T. Diamond
Abstract

Random Forests are a powerful classification technique consisting of a collection of decision trees. One useful feature of Random Forests is the ability to determine the importance of each variable in predicting the outcome. This is done by permuting each variable in turn and computing the change in prediction accuracy before and after the permutation. This variable importance calculation is similar to a one-factor-at-a-time experiment and is therefore inefficient. In this paper, we use a regular fractional factorial design to determine which variables to permute in each trial. Based on the results of the trials in the experiment, we calculate the individual importance of the variables, with improved precision over the standard method. The method is illustrated with a study of student attrition at Monash University.

Keywords: Random Forests, Variable Importance, Fractional Factorial Designs, Student Attrition.
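
The idea in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the 2^(5-2) design with generators D = AB and E = AC, the toy data, and the stand-in `predict` function (used in place of a trained Random Forest) are all hypothetical choices made for the example. Each run of the design permutes every variable whose level is +1, and a variable's importance is estimated as the main-effect drop in accuracy across all runs.

```python
import itertools
import random


def fractional_factorial(n_base, generators):
    """Regular two-level design: a full factorial in n_base columns plus
    one extra column per generator (the product of the listed base columns)."""
    runs = []
    for base in itertools.product((-1, 1), repeat=n_base):
        extra = []
        for gen in generators:
            level = 1
            for idx in gen:
                level *= base[idx]
            extra.append(level)
        runs.append(list(base) + extra)
    return runs


def accuracy(predict, X, y):
    """Fraction of rows whose prediction matches the label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permute_columns(X, cols, rng):
    """Return a copy of X with each column in cols shuffled independently."""
    X_new = [list(row) for row in X]
    for c in cols:
        shuffled = [row[c] for row in X]
        rng.shuffle(shuffled)
        for row, value in zip(X_new, shuffled):
            row[c] = value
    return X_new


def ffd_importance(predict, X, y, design, rng):
    """Importance of variable j = mean accuracy over runs where j is left
    intact (-1) minus mean accuracy over runs where j is permuted (+1)."""
    responses = []
    for run in design:
        cols = [j for j, level in enumerate(run) if level == 1]
        responses.append(accuracy(predict, permute_columns(X, cols, rng), y))
    n_runs = len(design)
    return [
        -2.0 * sum(run[j] * r for run, r in zip(design, responses)) / n_runs
        for j in range(len(design[0]))
    ]


# Hypothetical toy setup: five binary variables, only variable 0 matters.
rng = random.Random(0)
X = [[rng.randint(0, 1) for _ in range(5)] for _ in range(400)]
y = [row[0] for row in X]
predict = lambda row: row[0]  # stand-in for a fitted Random Forest

design = fractional_factorial(3, [(0, 1), (0, 2)])  # 8 runs, 5 variables
importance = ffd_importance(predict, X, y, design, rng)
```

Because every run contributes to every variable's estimate, each importance is an average over all eight runs rather than a single before-and-after comparison. Note that a small regular fraction like this one aliases main effects with two-factor interactions, so the choice of generators matters when variables interact.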


Similar resources

Optimal Fractional Factorial Split-plot Designs for Model Selection

Fractional factorial designs are used widely in screening experiments to identify significant effects. It is not always possible to perform the trials in a complete random order and hence, fractional factorial split-plot designs arise. In order to identify optimal fractional factorial split-plot designs in this setting, the Hellinger distance criterion (Bingham and Chipman (2007)) is adapted. T...


Generalized Resolution and Minimum Aberration for Nonregular Fractional Factorial Designs

Finding the optimal design for a given number of runs is a central problem in fractional factorial designs (FFDs). Resolution, introduced by Box and Hunter (1961), is the most widely used criterion and was originally applied to regular FFDs. The resolution criterion has been extended to non-regular FFDs, where it is called the generalized resolution criterion. This criterion is providing the idea of ge...


Factorial and Fractional Factorial Designs with Randomization Restrictions - a Projective Geometric Approach

Two-level factorial and fractional factorial designs have played a prominent role in the theory and practice of experimental design. Though commonly used in industrial experiments to identify the significant effects, it is often undesirable to perform the trials of a factorial design (or, fractional factorial design) in a completely random order. Instead, restrictions are imposed on the randomi...


Fitting Second-order Models to Mixed Two-level and Four-level Factorial Designs: Is There an Easier Procedure?

Fitting response surface models is usually carried out using statistical packages to solve complicated equations in order to produce the estimates of the model coefficients. This paper proposes a new procedure for fitting response surface models to mixed two-level and four-level factorial designs. New and easier formulae are suggested to calculate the linear, quadratic and the interaction coeff...


To save our environment

An algorithmic approach to constructing mixed-level orthogonal and near-orthogonal arrays; Quasi Monte Carlo simulations of Brownian sheet with an application to interest rate stochastic string model; A method for generating uniformly scattered points on the Lp-norm unit sphere and its application in statistical simulation; A new class of model-robust designs; A method for screening active effects ...



Publication date: 2013